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Abstract — This paper provides performance bounds for com- 
pressed sensing in the presence of Poisson noise using expander 
graphs. The Poisson noise model is appropriate for a variety of 
applications, including low-light imaging and digital streaming, 
where the signal-independent and/or bounded noise models used 
in the compressed sensing literature are no longer applicable. 
In this paper, we develop a novel sensing paradigm based on 
expander graphs and propose a MAP algorithm for recovering 
sparse or compressible signals from Poisson observations. The 
geometry of the expander graphs and the positivity of the 
corresponding sensing matrices play a crucial role in estab- 
lishing the bounds on the signal reconstruction error of the 
proposed algorithm. We support our results with experimental 
demonstrations of reconstructing average packet arrival rates 
and instantaneous packet counts at a router in a communication 
network, where the arrivals of packets in each flow follow a 
Poisson process. 



Index Terms — compressive measurement, expander graphs, 
RIP-1, photon-limited imaging, packet counters 



I. Introduction 

The goal of compressive sampling or compressed sensing 
(CS) (TJ, |2) is to replace conventional sampling by a more 
efficient data acquisition framework, which generally requires 
fewer sensing resources. This paradigm is particularly enticing 
whenever the measurement process is costly or constrained in 
some sense. For example, in the context of photon-limited 
applications (such as low-light imaging), the photomultiplier 
tubes used within sensor arrays are physically large and ex- 
pensive. Similarly, when measuring network traffic flows, the 
high-speed memory used in packet counters is cost-prohibitive. 
These problems appear ripe for the application of CS. 
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However, photon-limited measurements [3| and 
arrivals/departures of packets at a router [4] are commonly 
modeled with a Poisson probability distribution, posing 
significant theoretical and practical challenges in the context 
of CS. One of the key challenges is the fact that the 
measurement error variance scales with the true intensity of 
each measurement, so that we cannot assume constant noise 
variance across the collection of measurements. Furthermore, 
the measurements, the underlying true intensities, and the 
system models are all subject to certain physical constraints, 
which play a significant role in performance. 

Recent works [5]-[8 1 explore methods for CS reconstruction 
in the presence of impulsive, sparse or exponential-family 
noise, but do not account for the physical constraints as- 
sociated with a typical Poisson setup and do not contain 
the related performance bounds emphasized in this paper. 
In previous work (9), [ |T0| , we showed that a Poisson noise 
model combined with conventional dense CS sensing matri- 
ces (properly scaled) yielded performance bounds that were 
somewhat sobering relative to bounds typically found in the 
literature. In particular, we found that if the number of photons 
(or packets) available to sense were held constant, and if 
the number of measurements, m, was above some critical 
threshold, then larger m in general led to larger bounds on 
the error between the true and the estimated signals. This can 
intuitively be understood as resulting from the fact that dense 
CS measurements in the Poisson case cannot be zero-mean, 
and the DC offset used to ensure physical feasibility adversely 
impacts the noise variance. 

The approach considered in this paper hinges, like most 
CS methods, on reconstructing a signal from compressive 
measurements by optimizing a sparsity-regularized goodness- 
of-fit objective function. In contrast to many CS approaches, 
however, we measure the fit of an estimate to the data using 
the Poisson log-likelihood instead of a squared error term. 
This paper demonstrates that the bounds developed in previous 
work can be improved for some sparsity models by considering 
alternatives to dense sensing matrices with random entries. 
In particular, we show that deterministic sensing matrices 
given by scaled adjacency matrices of expander graphs have 
important theoretical characteristics (especially an l\ version 
of the restricted isometry property [ 1 1 1) that are ideally suited 
to controlling the performance of Poisson CS. 

Formally, suppose we have a signal 0* £ M™ with known 
i\ norm ||#*||i (or a known upper bound on ||0*||i). We 
aim to find a matrix $ e M™ xn with m, the number 



of measurements, as small as possible, so that 9* can be 
recovered efficiently from the measured vector y £ M™, 
which is related to $9* through a Poisson observation model. 
The restriction that elements of $ be nonnegative reflects 
the physical limitations of many sensing systems of interest 
(e.g., packet routers and counters or linear optical systems). 
The original approach employed dense random matrices fTT) , 
[12|. It has been shown that if the matrix 4> acts nearly 
isometrically on the set of all fc-sparse signals, thus obeying 
what is now referred to as the Restricted Isometry Property 
with respect to £2 norm (RIP-2) [11 1, then the recovery of 9* 
from <J>#* is indeed possible. It has been also shown that dense 
random matrices constructed from Gaussian, Bernoulli, or 
partial Fourier ensembles satisfy the required RIP-2 property 
with high probability fTT) . 

Adjacency matrices of expander graphs fl3) have been 
recently proposed as an alternative to dense random matrices 
within the compressed sensing framework, leading to compu- 
tationally efficient recovery algorithms |[T4|-|[T6|. It has been 
shown that variations of the standard recovery approaches such 
as basis pursuit [2] and matching pursuit J17) are consistent 
with the expander sensing approach and can recover the 
original sparse signal successfully [18], 1 19]. In the presence of 
Gaussian or sparse noise, random dense sensing and expander 
sensing are known to provide similar performance in terms of 
the number of measurements and recovery computation time. 
Berinde et al. proved that expander graphs with sufficiently 
large expansion are near-isometries on the set of all fc-sparse 
signals in the i\ norm; this is referred as a Restricted Isometry 
Property for £\ norm (RIP-1) (T8J. Furthermore, expander 
sensing requires less storage whenever the signal is sparse 
in the canonical basis, while random dense sensing provides 
slightly tighter recovery bounds [16|. 

The approach described in this paper consists of the follow- 
ing key elements: 

• expander sensing matrices and the RIP-1 associated with 
them; 

• a reconstruction objective function which explicitly in- 
corporates the Poisson likelihood; 

• a countable collection of candidate estimators; and 

• a penalty function defined over the collection of candi- 
dates, which satisfies the Kraft inequality and which can 
be used to promote sparse reconstructions. 

In general, the penalty function is selected to be small for 
signals of interest, which leads to theoretical guarantees that 
errors are small with high probability for such signals. In this 
paper, exploiting the RIP-1 property and the non-negativity of 
the expander-based sensing matrices, we show that, in contrast 
to random dense sensing, expander sensing empowered with 
a maximum a posteriori (MAP) algorithm can approximately 
recover the original signal in the presence of Poisson noise, 
and we prove bounds which quantify the MAP performance. 
As a result, in the presence of Poisson noise, expander graphs 
not only provide general storage advantages, but they also 
allow for efficient MAP recovery methods with performance 
guarantees comparable to the best /c-term approximation of the 
original signal. Finally, the bounds are tighter than those for 



specific dense matrices proposed by Willett and Raginsky (9), 
1 10] whenever the signal is sparse in the canonical domain, 
in that a log term in the bounds in [10] is absent from the 
bounds presented in this paper. 

A. Relationship with dense sensing matrices for Poisson CS 

In recent work, the authors established performance bounds 
for CS in the presence of Poisson noise using dense 
sensing matrices based on appropriately shifted and scaled 
Rademacher ensembles [9], [10]. Several features distinguish 
that work from the present paper: 

• The dense sensing matrices used in (9), flO) require 
more memory to store and more computational resources 
to apply to a signal in a reconstruction algorithm. The 
expander-based approach described in this paper, in con- 
trast, is more efficient. 

• The expander-based approach described in this paper 
works only when the signal of interest is sparse in the 
canonical basis. In contrast, the dense sensing matrices 
used in J9), fit)) can be applied to arbitrary sparsity bases 
(though the proof technique there needs to be altered 
slightly to accommodate sparsity in the canonical basis). 

• The bounds in both this paper and (5), flO) reflect a 
sobering tradeoff between performance and the number 
of measurements collected. In particular, more measure- 
ments (after some critical minimum number) can actually 
degrade performance as a limited number of events (e.g., 
photons) are distributed among a growing number of 
detectors, impairing the SNR of the measurements. 

B. Notation 

Nonnegative reals (respectively, integers) will be denoted 
by R + (respectively, Z + ). Given a vector u £ R™ and a set 



S C {1, 



1}, we will denote by u s the vector obtained 



by setting to zero all coordinates of u that are in S c , the 
complement of S: VI < i < n,uf = Uil{igs}- Given some 
1 < k < n, let S be the set of positions of the k largest (in 
magnitude) coordinates of u. Then u^ = u s will denote the 
best k-term approximation of u (in the canonical basis of W 1 ), 
and 



a k (u) 



,(*)|L = 



E 



will denote the resulting £1 approximation error. The £ 
quasinorm measures the number of nonzero coordinates of u: 
\\ u \\o = J2"=i l{«;#o}- For a subset S C {1, ...,n} we will 
denote by 1$ the vector with components l^ ieS j, 1 < i < n. 
Given a vector u, we will denote by u + the vector obtained by 
setting to zero all negative components of u: for all 1 < i < n, 
u~l — max{0, Ui}. Given two vectors u, v £ R™, we will write 
u y v if Ui > Vi for all 1 < i < n. If u > a^{i,...,n} for some 
a £ R, we will simply write u > a. We will write >- instead 
of >z if the inequalities are strict for all i. 

C. Organization of the paper 

This paper is organized as follows. In Section [Til we 
summarize the existing literature on expander graphs applied 



to compressed sensing and the RIP-1 property. Section [TTT] 
describes how the problem of compressed sensing with Pois- 
son noise can be formulated in a way that explicitly accounts 
for nonnegativity constraints and flux preservation (i.e., we 
cannot detect more events than have occurred); this section 
also contains our main theoretical result bounding the error 
of a sparsity penalized likelihood reconstruction of a signal 
from compressive Poisson measurements. These results are 



illustrated and further analyzed in Section IV in which we 



focus on the specific application of efficiently estimating 
packet arrival rates. Several technical discussions and proofs 
have been relegated to the appendices. 

II. Background on expander graphs 

We start by defining an unbalanced bipartite vertex- 
expander graph. 

Definition II.l. We say that a bipartite simple graph G = 
(A, B, E) with (regular) left degreaMd is a (fc, e)-expander if, 
for any S C A with \S\ < fc, the set of neighbors Af(S) of S 
has size \AT{S)\ > (1 - e)d\S\. 

Figure [T] illustrates such a graph. Intuitively a bipartite graph 
is an expander if any sufficiently small subset of its variable 
nodes has a sufficiently large neighborhood. In the CS setting, 
A (resp., B) will correspond to the components of the original 
signal (resp., its compressed representation). Hence, for a 
given \A\, a "high-quality" expander should have \B\, d, and e 
as small as possible, while fc should be as close as possible to 
\B\. The following proposition, proved using the probabilistic 
method (20), is well-known in the literature on expanders: 

Proposition II.2 (Existence of high-quality expanders). For 
any 1 < k < \ and any e £ (0, 1), there exists a (fc, e)- 

expander with left degree d = O I og ^' ' ) and right set size 

m _ Q ( fclog(n/fc) s 



Unfortunately, there is no explicit construction of expanders 



from Definition II. 1 However, it can be shown that, with high 
probability, any d-regular random graph with 



d = Q 



log(n/fc) 



and m = O 



k \og(n/k) 



satisfies the required expansion property. Moreover, the graph 
may be assumed to be right-regular as well, i.e., every node 
in B will have the same (right) degree D ]2T) . Counting the 
number of edges in two ways, we conclude that 



\E\ = \A\d=\B\D 



D = 



Q 



Thus, in practice it may suffice to use random bipartite regular 
graphs instead of expander^] Moreover, there exists an explicit 
construction for a class of expander graphs that comes very 

'That is, each node in A has the same number of neighbors in B. 

2 Briefly, we can first generate a random left-regular graph with left degree 
d (by choosing each edge independently). That graph is, with overwhelming 
probability, an expander graph. Then, given an expander graph which is only 
left-regular, a paper by Guruswami et al. [22] shows how to construct an 
expander graph with almost the same parameters, which is both left-regular 
and right-regular. 




M{S) 
\N{S)\>d{l-e)\S\ 



Fig. 1. A (fc, e) -expander. In this example, the green nodes correspond to 
A, the blue nodes correspond to B, the yellow oval corresponds to the set 
S C A, and the orange oval corresponds to the set Af(S) C B. There are 
three colliding edges. 



close to the guarantees of Proposition II. 2 This construction, 



due to Guruswami et al. [23], uses Parvaresh-Vardy codes [24 1 
and has the following guarantees: 

Proposition II.3 (Explicit construction of high-quality ex- 
panders). For any positive constant /3, and any n,k,e, there 
exists a deterministic explicit construction of a (fc, e)-expander 



graph with d = O 



log n \ & 



and m = Q{d 2 k 1+ P). 



Expanders have been recently proposed as a means of 
constructing efficient compressed sensing algorithms [15|, 
1 18 1, fl9), (22). In particular, it has been shown that any 



n-dimensional vector that is fc-sparse can be fully recovered 
using O (A: log (f )) measurements in O (nlog (f )) time [15 



(19) . It has been also shown that, even in the presence of 
noise in the measurements, if the noise vector has low l\ 
norm, expander-based algorithms can approximately recover 
any fc-sparse signal [ 16 1, fl8), fl9). One reason why expander 



graphs are good sensing candidates is that the adjacency 
matrix of any (fc,e)-expander almost preserves the t\ norm 



of any /c-sparse vector [18]. In other words, if the adjacency 
matrix of an expander is used for measurement, then the l\ 
distance between two sufficiently sparse signals is preserved 
by measurement. This property is known as the "Restricted 
Isometry Property for t\ norms" or the "RIP-1" property. 
Berinde et al. have shown that this condition is sufficient for 
sparse recovery using t-y minimization JT8). 

The precise statement of the RIP-1 property, whose proof 
can be found in | fl5) , goes as follows: 

Lemma II.4 (RIP-1 property of the expander graphs). Let F 
be the m x n adjacency matrix of a (fc, e) expander graph G. 
Then for any fc-sparse vector x £ M. n we have: 



(1 - 2e)d||a;||i < ||Fa;||i < d||x||i 



(1) 



The following proposition is a direct consequence of the 
above RIP-1 property. It states that if, for any almost fc- 
sparse vectoi^ u, there exists a vector v whose l\ norm is 
close to that of u, and if Fv approximates Fu, then v also 
approximates u. Our results of Section III exploit the fact 



By "almost sparsity" we mean that the vector has at most k significant 
entries. 



that the proposed MAP decoding algorithm outputs a vector 
satisfying the two conditions above, and hence approximately 
recovers the desired signal. 

Proposition II.5. Let F be the adjacency matrix of a (2k, e)- 
expander and u,v be two vectors in M. n , such that 

||«||i> Hli-A 
for some A > 0. Then \\u — v\\\ is upper-bounded by 



< 



l-2e 



(2ff fc (u) + A) 



\Fu-Fv\ 



l-6e v ~" v ' ' ' ' d(l — 6e) ' 

In particular, if we let e = 1/16, then we get the bound 



v\\i < 4/r k (u) + -\\Fu 
a 



Fv\ 



2A. 



Proof: See Appendix [B] ■ 

For future convenience, we will introduce the following 

piece of notation. Given n and 1 < k < n/4, we will 

denote by Gk.n a (2k, l/16)-expander with left set size 



n whose existence is guaranteed by Proposition II. 2 Then 

G Kn = (A, B, E) has 

|A| = n, \B\ =m= 0(klog(n/k)), d = 0(\og(n/k)). 

III. Compressed sensing in the presence of Poisson 

Noise 

A. Problem statement 

We wish to recover an unknown vector 8* £ R™ of Poisson 
intensities from a measured vector y £ Z™, sensed according 
to the Poisson model 



y ~ Poisson(<P0*), 



(2) 



where $ £ R™ x ™ is a positivity -preserving sensing matrix 
That is, for each j £ {1, . . . , m}, yj is sampled independently 
from a Poisson distribution with mean ($#*).,: 



p*^(?/)=n p (*«* )M> 

3=1 

where, for any z £ Z + and A £ R+, we have 

A* 



v(*) 



3~ A if A > 
l{z=o} otherwise 
where the A = case is a consequence of the fact that 



(3) 



(4) 



lim — r e = l( y= 



{z=0}- 



We assume that the t\ norm of 8* is known, ||0*||i = L 
(although later we will show that this assumption can be 
relaxed). We are interested in designing a sensing matrix $ 
and an estimator 8 — 0(y), such that 8* can be recovered 
with small expected t\ risk 



R[6,0*j =E* e .||0-01i, 
where the expectation is taken w.r.t. the distribution P$#* . 

4 Our choice of this observation model as opposed to a "shot-noise" 
model based on $ operating on Poisson observations of 8* is discussed in 
Appendix Ia| 



B. The proposed estimator and its performance 

To recover 8*, we will use a penalized Maximum Likelihood 
Estimation (pMLE) approach. Let us choose a convenient 1 < 
k < n/4 and take $ to be the normalized adjacency matrix 
of the expander Gk .„ (cf. Section nil for definitions): $ = 
F/d. Moreover, let us choose a finite or countable set 8l of 
candidate estimators 8 £ K™ with \\8\\i < L, and a penalty 
pen : 0£ — ¥ M+ satisfying the Kraft inequality^ 

J2 e - pcn(e) < 1. (5) 

eee L 
For instance, we can impose less penalty on sparser signals or 
construct a penalty based on any other prior knowledge about 
the underlying signal. 

With these definitions, we consider the following penalized 
maximum likelihood estimator (pMLE): 

8 = argmin [- log P$ e (y) + 2 pen(0)] (6) 

eee L 

One way to think about the procedure in (|6]l is as a Maximum 
a posteriori Probability (MAP) algorithm over the set of 
estimates 0^, where the likelihood is computed according to 
the Poisson model Q and the penalty function corresponds to 
a negative log prior on the candidate estimators in 0^. 

Our main bound on the performance of the pMLE is as 
follows: 

Theorem III.l. Let $ be the normalized adjacency matrix 
of Gk.n, let 8* £ K" be the original signal compressively 
sampled in the presence of Poisson noise, and let 8 be obtained 
through |6]). Then 



R(e,0') <4er fc (0*) 



L min [KL(P$ e . 
eee L 



■2pen(0)], (7) 



where 



KL( 



y p s ( y )io g 544 

Z^ 9\y) s p / 



is the Kullback-Leibler divergence (relative entropy) between 
P ff and V h eg. 



Proof: Since 8 £ L , we have L = \\6*\\ x > ||6>||i. 
Hence, using Proposition |H.5| with A = 0, we can write 

||0*-%<4o*(0*)+4||4(0*-0)||i. 
Taking expectations, we obtain 

R (§, r) < Aa k (8*) + 4E $e , ||$(0* - §)\\i 

<4cr fe (r)+4 V /E^,||$(c9*-fJ)||2 (8) 

where the second step uses Jensen's inequality. Using Lem- 
mas |C.1| and |C.2| in Appendix O we have 

• 0)11? < AL min [KL(P$ e . || P$ e ) + 2pen(0)] 



E 



i>0 



m 



eeBi 



5 Many penalization functions can be modified slightly (e.g., scaled ap- 
propriately) to satisfy the Kraft inequality. All that is required is a finite 
collection of estimators (i.e., 0^) and an associated prefix code for each 
candidate estimate in ©£,. For instance, this would certainly be possible for a 
total variation penalty, though the details are beyond the scope of this paper. 



Substituting this into d|), we obtain ffi), ■ 

The bound of Theorem III. 1 is an oracle inequality: it states 
that the t\ error of 9 is (up to multiplicative constants) the 
sum of the fc-term approximation error of 9* plus \/L times 
the minimum penalized relative entropy error over the set of 
candidate estimators 0^. The first term in (|7} is smaller for 
sparser 9*, and the second term is smaller when there is a 
9 £ Ql which is simultaneously a good approximation to 9* 
(in the sense that the distributions P$g» and P$# are close) 
and has a low penalty. 

Remark III.2. So far we have assumed that the £i norm 
of 9* is known a priori. If this is not the case, we can 
still estimate it with high accuracy using noisy compressive 
measurements. Observe that, since each measurement yj is a 
Poisson random variable with mean ($6**) -, ^ yj is Poisson 

with mean ||$0*||i. Therefore, t/^l/j is approximately 

normally distributed with mean « •^/||$0*||i and variance 
\ |[26] Sec. 6.2]p] Hence, Mill's inequality J27] Thm. 4.7] 



guarantees that, for every positive t, 



Pr 



/E%-v4^ 



> t 



-2f 



< 



27Ti 



where < is meant to indicate the fact that this is only an 
approximate bound, with the approximation error controlled 
by the rate of convergence in the central limit theorem. Now 
we can use the RIP-1 property of the expander graphs obtain 
the estimates 



/$>-* <ll*01i< 



and 



(l-2e) " (l-2e) ~ " Hl 

that hold with (approximate) probability at least 1 — 

(^rf)- 1 ^ 2 ' 2 . 

C. A bound in terms of l\ error 

The bound of Theorem |HI.1| is not always useful since it 
bounds the l\ risk of the pMLE in terms of the relative entropy. 
A bound purely in terms of l\ errors would be more desirable. 
However, it is not easy to obtain without imposing extra 
conditions either on 9* or on the candidate estimators in 0^. 
This follows from the fact that the divergence KL(P$#* ||P$e) 
may take the value +oo if there exists some y such that 
P* 9 fo) = 0butP* fl .(j/)>0. 

One way to eliminate this problem is to impose an additional 
requirement on the candidate estimators in 0^: There exists 
some c > 0, such that 



$9^ c, 



V6>ee L 



(9) 



Under this condition, we will now develop a risk bound for 
the pMLE purely in terms of the t\ error. 

6 This observation underlies the use of variance-stabilizing transforms. 



Theorem III.3. Suppose that all the conditions of Theo- 
rem III. 1 are satisfied. In addition, suppose that the set Ql 



satisfies the condition |9]). Then 



Rid, 6*) <4a k (6*) + ; 



L min 
eee L 



•2pen(0) 



(10) 



Proof: Using Lemma C.3 in Appendix [Cj we get the 
bound 



KL(P^. P* fl ) < - 0* - f, V0e0 L . 
c 

Substituting this into Eq. (IT), we get (jT0j». ■ 

Remark III.4. Because every 9 g 0^ satisfies \\9\\i < L, the 
constant c cannot be too large. In particular, if Q holds, then 
for every 9 £ Ql we must have 

||$0||i > mmm(§9)j > vnc. 
j 

On the other hand, by the RIP-1 property we have ||$0||i < 
ll^lli < L. Thus, a necessary condition for |9]) to hold is c < 
L/m. Since m — 0(k\og(n/k)), the best risk we may hope 
to achieve under some condition like |9| is on the order of 



R[e,e*) <4<T fc (0*) 



+ Cjmin [klog(n/k)\\6- 



\l + Lpen($)] 
(11) 



for some constant C, e.g., by choosing c oc klo i n i k \ - 
Effectively, this means that, under the positivity condition djj), 
the t\ error of 9 is the sum of the fc-term approximation 
error of 0* plus y/ra = -Jk log(n/fc) times the best penalized 
t\ approximation error. The first term in ( fTT| is smaller for 
sparser 9*, and the second term is smaller when there is a 
9 € Ql which is simultaneously a good t\ approximation to 
9* and has a low penalty. 



D. Empirical performance 

Here we present a simulation study that validates our 
method. In this experiment, compressive Poisson observations 
are collected of a randomly generated sparse signal passed 
through the sensing matrix generated from an adjacency matrix 
of an expander. We then reconstruct the signal by utilizing an 
algorithm that minimizes the objective function in |6|, and 
assess the accuracy of this estimate. We repeat this procedure 
over several trials to estimate the average performance of the 
method. 

More specifically, we generate our length-n sparse signal 9* 
through a two-step procedure. First we select fc elements of 
{1, . . . , n} uniformly at random, then we assign these elements 
an intensity /. All other components of the signal are set to 
zero. For these experiments, we chose a length n = 100,000 
and varied the sparsity k among three different choices of 
100, 500, and 1,000 for two intensity levels I of 10,000 and 
100,000. We then vary the number m of Poisson observations 
from 100 to 20,000 using an expander graph sensing matrix 
with degree d = 8. Recall that the sensing matrix is normalized 



such that the total signal intensity is divided amongst the 
measurements, hence the seemingly high choices of /. 

To reconstruct the signal, we utilize the SPIRAL-^i al- 
gorithm p8) which solves |6} when pen(#) = t||0||i. We 
design the algorithm to optimize over the continuous domain 
K™ instead of the discrete set 0^. This is equivalent to the 
proposed pMLE formulation in the limit as the discrete set of 
estimates becomes increasingly dense in the set of all 9 6 K™ 
with ||0||i < L, i.e., we quantize this set on an ever finer 
scale, increasing the bit allotment to represent each 8. In this 
high-resolution limit, the Kraft inequality requirement |5]) on 
the penalty pen(6>) will translate to / e~ pcn(9) d6> < oo. If 
we select a penalty proportional to the negative log of a prior 
probability distribution for 9, this requirement will be satisfied. 
From a Bayesian perspective, the l\ penalty arises by assuming 
each component 9i is drawn i.i.d. from a zero-mean Laplace 
prior p(9i) = e~^ 6i ^ b /2b. Hence the regularization parameter 
t is inversely related to the scale parameter b of the prior, 
as a larger r (smaller b) will promote solutions with more 
zero- valued components. 

This relaxation results in a computationally tractable convex 
program over a continuous domain, albeit implemented on a 
machine with finite precision. The SPIRAL algorithm utilizes a 
sequence of quadratic subproblems derived by using a second- 
order Taylor expansion of the Poisson log-likelihood at each 
iteration. These subproblems are made easier to solve by 
using a separable approximation whereby the second-order 
Hessian matrix is approximated by a scaled identity matrix. 
For the particular case of the £\ penalty, these subproblems 
can be solved quickly, exactly, and noniteratively by a soft- 
thresholding rule. 

After reconstruction, we assess the estimate 9 according to 
the normalized i\ error ||0*— #||i/|| 9* ||i. We select the regular- 
ization weighting r in the SPIRAL-^i algorithm to minimize 
this quantity for each randomly generated experiment indexed 
by (I,k,m). To assure that the results are not biased in our 
favor by only considering a single random experiment for 
each (/, k, to), we repeat this experiment several times. The 
averaged reconstruction accuracy over 10 trials is presented in 
Figure [2] 

These results show that the proposed method is able to 
accurately estimate sparse signals when the signal intensity 
is sufficiently high; however, the performance of the method 
degrades for lower signal strengths. More interesting is the 
behavior as we vary the number of measurements. There is 
a clear phase transition where accurate signal reconstruction 
becomes possible, however the performance gently degrades 
with the number of measurements since there is a lower 
signal-to-noise ratio per measurement. This effect is more 
pronounced at lower intensity levels, as we more quickly 
enter the regime where only a few photons are collected 
per measurement. These findings support the error bounds 
developed in Section [Ill-B| 

IV. Application: Estimating packet arrival rates 
This section describes an application of the pMLE estimator 
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Fig. 2. Average performance (as measured by the normalized £j error 
\\0* — 0||i/||0*||i) for the proposed expander-based observation method for 
recovering sparse signals under Poisson noise. In this experiment, we sweep 
over a range of measurements and 

consider a few sparsity (k) and intensity (/) levels of the true 
signal. 



packet arrival rates and instantaneous packet counts for a given 
number of streams (or flows) at a router in a communication 
network, where the arrivals of packets in each flow are 
assumed to follow a Poisson process. All packet counting 
must be done in hardware at the router, and any hardware 
implementation must strike a delicate balance between speed, 
accuracy, and cost. For instance, one could keep a dedicated 
counter for each flow, but, depending on the type of memory 
used, one could end up with an implementation that is either 
fast but expensive and unable to keep track of a large number 
of flows (e.g., using SRAMs, which have low access times, but 
are expensive and physically large) or cheap and high-density 
but slow (e.g., using DRAMs, which are cheap and small, but 
have longer access times) p9[ , p0[ . 

However, there is empirical evidence pT) , [32) that flow 
sizes in IP networks follow a power-law pattern: just a few 
flows (say, 10%) carry most of the traffic (say, 90%). Based on 
this observation, several investigators have proposed method- 
ologies for estimating flows using a small number of counters 
by either (a) keeping track only of the flows whose sizes 
exceed a given fraction of the total bandwidth (the approach 
suggestively termed "focusing on the elephants, ignoring the 
mice") p9) or (b) using sparse random graphs to aggregate the 
raw packet counts and recovering flow sizes using a message 



of Section III an indirect approach for reconstructing average 



passing decoder |30|. 

We consider an alternative to these approaches based on 
Poisson CS, assuming that the underlying Poisson rate vector 
is sparse or approximately sparse — and, in fact, it is the 
approximate sparsity of the rate vector that mathematically 
describes the power-law behavior of the average packet counts. 
The goal is to maintain a compressed summary of the process 
sample paths using a small number of counters, such that it 



is possible to reconstruct both the total number of packets in 
each flow and the underlying rate vector. Since we are dealing 
here with Poisson streams, we would like to push the metaphor 
further and say that we are "focusing on the whales, ignoring 
the minnows." 

A. Problem formulation 

We wish to monitor a large number n of packet flows 
using a much smaller number m of counters. Each flow is a 
homogeneous Poisson process (cf. |4| for details pertaining to 
Poisson processes and networking applications). Specifically, 
let A* € K™ denote the vector of rates, and let U denote 
the random process U = {Ut}t&i+ with sample paths in Z™, 
where, for each i £ {1, . . . , n}, the ith component of U is 
a homogeneous Poisson process with the rate of Aj arrivals 
per unit time, and all the component processes are mutually 
conditionally independent given A. 

The goal is to estimate the unknown rate vector A based 
on y. We will focus on performance bounds for power-law 
network traffic, i.e., for A* belonging to the class 

£a,L = {A E Rl : ||A||i = L ;<J k (X) = 0(k~ a )} (12) 

for some L > and a > 1, where the constant hidden in 
the O(-) notation may depend on L . Here, a is the power- 
law exponent that controls the tail behavior; in particular, the 
extreme regime a —} +oo describes the fully sparse setting. 
As in Section III we assume the total arrival rate ||A*||i to 
be known (and equal to a given Lq) in advance, but this 
assumption can be easily dispensed with (cf. Remark [111.21 ). 

As before, we evaluate each candidate estimator A = A(y) 
based on its expected i\ risk, 

r(x,x*) =E A .||A-A* 



B. Two estimation strategies 

We consider two estimation strategies. In both cases, we 
let our measurement matrix F be the adjacency matrix of 
the expander G k , n for a fixed k < n/4 (see Section III] for 
definitions). The first strategy, which we call the direct method, 
uses standard expander-based CS to construct an estimate of 
A*. The second is the pMLE strategy, which relies on the 



machinery presented in Section III and can be used when only 
the rates are of interest. 

1) The direct method: In this method, which will be used 
as a "baseline" for assessing the performance of the pMLE, 
the counters are updated in discrete time, every r time units. 
Let x — {x„}„gz + denote the sampled version of U, where 
X v = U VT . The update takes place as follows. We have a binary 
matrix F 6 {0, l} mx, \ and at each time v let y v — Fx v . In 
other words, y is obtained by passing a sampled n-dimensional 
homogeneous Poisson process with rate vector A through a 
linear transformation F. 

The direct method uses expander-based CS to obtain an 
estimate x v of x v from y v — Fx u , followed by letting 



A? ir 



X y 

VT 



(13) 



This strategy is based on the observation that x u /(vt) is the 
maximum-likelihood estimator of A*. To obtain x v , we need 
to solve the convex program 



minimize \\u\\i 



subject to Fu = y v 



which can be cast as a linear program [ 33 1 . The resulting 
solution x v may have negative coordinatesrl hence the use of 
the (•)+ operation in (jT3j. We then have the following result: 



Theorem IV.l. 



R 



(Af,A*) <4a fc (A*) 



|(A*) 



1/21 



where (A*) 1 / 2 is the vector with components ^ l /Xf,Wi. 



(14) 



Remark IV.2. Note that the error term in ( fT4] > is 0(l/^/v), 
assuming everything else is kept constant, which coincides 
with the optimal rate of the l\ error decay in parametric 
estimation problems. 

Proof: We first observe that, by construction, x v satisfies 
the relations Fx v = Fx v and ||:c„||i < ||av||i. Hence, 

Ep„ - ^rA*||i < Ep„ - x v \ x +E||»„ - fTA*||i 

<4E<7 fc (a;„)+E||a: I/ -i/TA*||i (15) 

where the first step uses the triangle inequality, while the 
second step uses Proposition |II.5| with A = 0. To bound the 
first term in (JT3J, let S C {1, . . . , n} denote the positions of 
the k largest entries of A*. Then, by definition of the best 
fc-term representation, 



<Jk{x v ) < \\x u -*w||i 
Therefore, 



ieS" 



/ Xi/^. 



Ea k (x v ) <E 



Ues c 



= VT S ^ X* = VT(J k {X* 



i£S" 



To bound the second term, we can use concavity of the square 
root, as well as the fact that each x v ^ ~ Poisson(j/rA*), to 
write 



Ella;,, -z/rA* ||i =E 



E 



" N 

^2\x n ,i -vtX*\ 
r n 







< J2 M^ - ^rA*) 2 = E v^A* 



! = 1 



Now, it is not hard to show that \\x~£ — vtX*\\i < \\x,, 
vtX*\U. Therefore, 



i?(A^ ir ,A*) < 



E||a;„ — i^r A* ||i 



<4a fc (A*) + 



\\(y) 1/2 h 



which proves the theorem. 



7 Khajehnejad et al. [34] have recently proposed the use of perturbed 
adjacency matrices of expanders to recover nonnegative sparse signals. 



2) The penalized MLE approach: In the penalized MLE ap- 
proach the counters are updated in a slightly different manner. 
Here the counters are still updated in discrete time, every r 
time units; however, each counter i G {1, • • • , m} is updated at 
times (i/T + j^t) , and only aggregates the packets that 

have arrived during the time period \ut + ^—^t, vt + ^r). 
Therefore, in contrast to the direct method, here each arriving 
packet is registered by at most one counter. Furthermore, 
since the packets arrive according to a homogeneous Poisson 
process, conditioned on the vector A*, the values measured 
by distinct counters are independent] Therefore, the vector of 
counts at time v obeys 

y v — Poisson($6>*) where 6* = A* 

TO 

which is precisely the sensing model we have analyzed in 
Section [Hi] 

Now assume that the total average arrival rate ||A*||i = Lo 
is known. Let A be a finite or a countable set of candidate 
estimators with ||A||i < Lq for all A £ A, and let pen(-) 
be a penalty functional satisfying the Kraft inequality over A. 
Given v and r, consider the scaled set 



K,T 



vrd 



A 



vrd , 



-A: Ae A 



with the same penalty function, pen(^^A) = pen(A) for 
all A <E A. We can now apply the results of Section [IH] 
Specifically, let 

^pMLE A m " 
vrd' 

where 9 is the corresponding pMLE estimator obtained ac- 
cording to d6j. The following theorem is a consequence of 
Theorem |III.3| and the remark following it: 

Theorem IV.3. If the set A satisfies the strict positivity 
condition Q, then there exists some absolute constant C > 0, 
such that 



R 



PPMLE^ 



A* <4<r fc (A*) 



■Ca 



' mm 

AGA 



fclog(n/fc)||A-A*||f + 



k Lq pen(A) 



(16) 

We now develop risk bounds under the power-law condition. 
To this end, let us suppose that A* is a member of the power- 
law class Slo.q defined in ( |12) , Fix a small positive number 
S, such that Lq/VS is an integer, and define the set 



A 



{A G R? : ||A||i < L ;Xi G {sVs}^ ,^} 



These will be our candidate estimators of A*. We can define the 
penalty function pen(A) x ||A||olog(<5 _1 ). For any A G £ q ,l 
and any 1 < r < n we can find some A < - r - ) G A, such that 
||AM|| xrand 



||A-AW||; 



-2<v 



The independence follows from the fact that if X\, ■ ■ ■ , X m are con- 
ditionally independent random variables, then for any choice of functions 
91 1 ' ' ' 1 9m , the random variables g\ (Xi ) , • • • , g m (X m ) are also condi- 
tionally independent. 



Here we assume that 6 is sufficiently small, so that the 

penalty term r ° s ^ ' dominates the quantization error r S. 

In order to guarantee that the penalty function satisfies Kraft's 
inequality, we need to ensure that 



r=l A (r) gA 



< 1. 



For every fixed r, there are exactly (") subspaces of dimension 
r, and each subspace contains exactly I ^% ) distinct elements 
of A. Therefore, as long as 



S< (2nL y 



(17) 



then 



n / \ n n i 

E(:)(^) r <£(-W^<£^<i, 



r=0 



and Kraft's inequality is satisfied. 

Using the fact that fclog(n/fc) = O(kd), we can bound the 
minimum over A G A in ( fT6"| > from above by 



nun 

Kr<n 



kdr 



rk\og(S 1 ) 



= oikd^^ 6 ^V a+1 



o 



(kd^+~A 



VT 

logn^ 2c ' 



We can now particularize Theorem |IV. 3 1 to the power-law case: 
Theorem IV.4. 

■ftpMLE y 



sup R A? 1 

= 0(k- a ) + O (ki d^+A 



log n V- ' 

VT 



where the constants implicit in the 0() notation depend on 
Lq and a. 

Note that the risk bound here is slightly worse than the 
benchmark bound of Theorem |I V. 1 1 However, it should be 
borne in mind that this bound is based on Theorem |III.3| 
rather than on the potentially much tighter oracle inequality 
of Theorem |IH. 1| since our goal was to express the risk of 
the pMLE purely in terms of the l\ approximation properties 
of the power-law class E a ^ . In general, we will expect the 
actual risk of the pMLE to be much lower than what the 



conservative bound of Theorem IV.4 predicts. Indeed, as we 



will see in Section IV-D the pMLE approach obtains higher 
empirical accuracy than the direct method. But first we show 
how the pMLE can be approximated efficiently with proper 
preprocessing of the observed counts y„ based on the structure 
of Gfe,„. 



C. Efficient pMLE approximation 

In this section we present an efficient algorithm for approx- 
imating the pMLE estimate. The algorithm consists of two 
phases: (1) first, we preprocess y v to isolate a subset A\ of 
A = {1, . . . ,n} which is sufficiently small and is guaranteed 
to contain the locations of the k largest entries of A* (the 
whales); (2) then we construct a set A of candidate estimators 
whose support sets lie in A\, together with an appropriate 
penalty, and perform pMLE over this reduced set. 

The success of this approach hinges on the assumption 
that the magnitude of the smallest whale is sufficiently large 
compared to the magnitude of the largest minnow. Specifically, 
we make the following assumption: Let S C A contain the 
locations of the k largest coordinates of A*. Then we require 
that 



minA* >9D 



A* -A 



,(fe) 



(18) 



Recall that D = O (^) = O (f ) is the right degree of the 
expander graph. One way to think about ( fT~8] > is in terms of a 
signal-to-noise ratio, which must be strictly larger than 9D. 
We also require vr to be sufficiently large, so that 



D 



A* -X 



:(fe) 



> 



log (mn) 



(19) 



Finally, we perturb our expander a bit as follows: choose an 
integer k' > so that 



k > max 



ri6(fcd + i) 



15d 



,2k 



(20) 



Then we replace our original (2k, 1/16) -expander G^ „ with 
left-degree d with a (k 1 , l/16)-expander G' k , with the same 
left degree. The resulting procedure, displayed below as Algo- 
rithm [T] has the following guarantees: 

Algorithm 1 Efficient pMLE approximation algorithm 
Input: Measurement vector y v , and the sensing matrix F. 
Output: An approximation A 

Let B\ consist of the locations of the kd largest elements 

of y v and let B 2 = B\B 1 . 

Let A2 contain the set of all variable nodes that have at 

least one neighbor in B 2 and let A\ — A\A 2 . 

Construct a candidate set of estimators A with support in 

A\ and a penalty pen(-) over A. 

Output the pMLE A. 



Theorem IV.5. Suppose the assumptions ( fl"8| ), (19) , and 
( p0| > hold. Then with probability at least 1 — A the set A\ 
constructed by Algorithm [T] has the following properties: (1) 
S C Ai\ (2) I Ax I < kd; (3) A 1 can be found in time 

O (to log to + nd). 

Proof: (1) First fix a measurement node j G B. Recall 
that y v> j is a Poisson random variable with mean ^ (FA*).-. 

'y V j is approx- 

^ (FA*),, and 



By the same argument as in Remark III.2 
imately normally distributed with mean « 



with variance ss |. Hence, it follows from Mill's inequality 
and the union bound that for every positive t 

3j 



Pr 



(FA* 



>t 



< 



2-nt 



If j is a neighbor of S, then (FA*) > min^s A*; whereas 
if j is not connected to S, then (FA*) < D X* - X 



log (mn) 



,(*o 



(where w.l.o.g. we assume 



Hence, by setting t = 

that t > 1), we conclude that, with probability at least 1 — — , 

for every measurement node j the following holds: 

• If j is a neighbor of S, then 



VT 

ly v ,i >\ — mm A* 



If j is not connected to S, then 



log (mn) 
~2 ' 



-D 



x *_ x *(k) 



log (mn) 



Consequently, by virtue of ( fj"8| i and ( |19) , with probability 
at least 1 — - every element of y v that is a neighbor of S 
has larger magnitude than every element of y v that is not a 
neighbor of S. 

(2) Suppose, to the contrary, that |Ai| > kd. Let A[ C A\ 
be any subset of size kd + 1. Now, Lemma 3.6 in (34) states 
that, provided e < 1 — 1/d, then every (£, e)-expander with left 
degree d is also a (£(l—e)d, 1 — l/d)-expander with left degree 
d. We apply this result to our (k' , l/16)-expander, where k' 
satisfies ( f20] >, to see that it is also a (kd+1, 1 — l/<i)-expander. 
Therefore, for the set A[ we must have |A/"(>l'i)| > \A[\ = 
kd+ 1, On the other hand, Af(A[) C B u so |A/"(Ai)| < kd. 
This is a contradiction, hence we must have |Ai| < kd. 

(3) Finding the sets B\ and B 2 can be done in 0(m log m) 
time by sorting y v . The set A\ can then can be found in time 
0(nd), by sequentially eliminating all nodes connected to each 
node in B 2 . ■ 

Having identified the set A\, we can reduce the pMLE 
optimization only to those candidates whose support sets lie 
in Ay. More precisely, if we originally start with a sufficiently 
rich class of estimators A, then the new feasible set can be 
reduced to 



A = 



JAe A:Su PP (A)c Ax} 



r 



Hence, by extracting the set A\, we can significantly reduce 
the complexity of finding the pMLE estimate. If |A| is small, 
the optimization can be performed by brute-force search in 
0(| A|) time. Otherwise, since |Ai| < kd, we can use the 
quantization technique from the preceding section with quan- 
tizer resolution VS to construct a A of size at most (Lo/vS) kd . 
In this case, we can even assign the uniform penalty 

pen(A) = log |A| =0(k log(n/k) log^ 1 )) , 

which amounts to a vanilla MLE over A. 
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Fig. 3. Relative t\ error as a function of number of whales k, for ^i-magic (LP), SSMP and pMLE for different choices of the power-law exponent a. The 
number of flows n = 5000, the number of counters m = 800, and the number of updates is 40. 
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Fig. 4. Probability of successful support recovery as a function of number of whales k, for l\ -magic (LP), SSMP and pMLE for different choices of the 
power-law exponent a. The number of flows n = 5000, the number of counters m = 800, and the number of updates is 40. 



D. Empirical performance 

Here we compare penalized MLE with £i-magic p5| , a 
universal t\ minimization method, and with SSMP [36], an 
alternative method that employs combinatorial optimization. 
£i-magic and SSMP both compute the "direct" estimator. The 
pMLE estimate is computed using Algorithm [T] above. For the 
ease of computation, the candidate set A is approximated by 
the convex set of all positive vectors with bounded £\ norm, 
and the CVX package (37), (38) is used to directly solve the 
pMLE objective function with pcn((9) = ||0||i. 

Figures 3(a) through 5(b)| report the results of numerical 
experiments, where the goal is to identify the k largest entries 
in the rate vector from the measured data. Since a random 
graph is, with overwhelming probability, an expander graph, 
each experiment was repeated 30 times using independent 
sparse random graphs with d = 8. 

We also used the following process to generate the rate 
vector. First, given the power-law exponent a, the magnitudes 
of the k whales were chosen according to a power-law distri- 
bution with parameter a. The positions of the k whales were 
then chosen uniformly at random. Finally the n — k minnows 
were sampled independently from a Af(Q, 10~ 6 ) distribution 
(negative samples were replaced by their absolute values). 
Thus, given the locations of the k whales, their magnitudes 



decay according to a truncated power law (with the cut- 
off at k), while the magnitudes of the minnows represent 
a noisy background. Figure [5] shows the relative l\ error 
(|| A — A„||i/||A||i) of the three above algorithms as a function 
of k. Note that in all cases a — 1, a = 1.5, and a = 2, the 
pMLE algorithm provides lower t\ errors. Similarly, Figure [4] 
reports the probability of exact recovery as a function of k. 
Again, it turns out that in all three cases the pMLE algorithm 
has higher probability of exact support recovery compared to 
the two direct algorithms. 

We also analyzed the impact of changing the number of 
updates on the accuracy of the three above algorithms. The 
results are demonstrated in Figure B] Here we fixed the number 
of whales to k = 30, and changed the number of updates 
from 10 to 200. It turned out that as the number of updates v 
increases, the relative t\ errors of all three algorithms decrease 
and their probability of exact support recovery consistently 
increase. Moreover, the pMLE algorithm always outperforms 
the £i-magic (LP), and SSMP algorithms. 

V. Conclusions 

In this paper we investigated expander-based sensing as 
an alternative to dense random sensing in the presence of 
Poisson noise. Even though the Poisson model is essential in 
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Fig. 5. Performance of £i -magic, SSMP and pMLE algorithms as a function 
of the number of updates v. The number of flows n = 5000, the number of 
counters m = 800, and the number of whales is k = 30. There are k whales 
whose magnitudes are assigned according to a power-law distribution with 
a = 1, and the remaining entries are minnows with magnitudes determined 
by a A/"(0, 10 — 6 ) random variable. 



some applications, it presents several challenges as the noise 
is not bounded, or even as concentrated as Gaussian noise, 
and is signal-dependent. Here we proposed using normalized 
adjacency matrices of expander graphs as an alternative con- 
struction of sensing matrices, and we showed that the binary 
nature and the RIP-1 property of these matrices yield provable 
consistency for a MAP reconstruction algorithm. 

The compressed sensing algorithms based on Poisson obser- 
vations and expander-graph sensing matrices provide a useful 
mechanism for accurately and robustly estimating a collection 
of flow rates with relatively few counters. These techniques 
have the potential to significantly reduce the cost of hardware 
required for flow rate estimation. While previous approaches 
assumed packet counts matched the flow rates exactly or that 
flow rates were i.i.d., the approach in this paper accounts 
for the Poisson nature of packet counts with relatively mild 



assumptions about the underlying flow rates (i.e., that only a 
small fraction of them are large). 

The "direct" estimation method (in which first the vector of 
flow counts is estimated using a linear program, and then the 
underlying flow rates are estimated using Poisson maximum 
likelihood) is juxtaposed with an "indirect" method (in which 
the flow rates are estimated in one pass from the compressive 
Poisson measurements using penalized likelihood estimation). 

The methods in this paper, along with related results in 
this area, are designed for settings in which the flow rates are 
sufficiently stationary, so that they can be accurately estimated 
in a fixed time window. Future directions include extending 
these approaches to a more realistic setting in which the flow 
rates evolve over time. In this case, the time window over 
which packets should be counted may be relatively short, but 
this can be mitigated by exploiting estimates of the flow rates 
in earlier time windows. 

Appendix A 
Observation models in Poisson inverse problems 

In Q and all the subsequent analysis in this paper, we 

assume 

y ~ Poisson(<I>6>*). 

However, one might question how accurately this models the 
physical systems of interest, such as a photon-limited imaging 
system or a router. In particular, we may prefer to think of 
only a small number of events (e.g., photons or packets) 
being incident upon our system, and the system then rerouting 
those events to a detector. In this appendix, we compare the 
statistical properties of these two models. Let z* ; i denote the 
number of events traveling from location i in the source (8*) to 
location j on the detector. Also, in this appendix let us assume 
$ is a stochastic matrix, i.e., each column of $ sums to one; 
in general, most elements of $ are going to be less than one. 
Physically, this assumption means that every event incident 
on the system hits some element of the detector array. Armed 
with these assumptions, we can think of $j $ as the probability 
of events from location i in 6* being transmitted to location 
j in the observation vector y. 

We consider two observation models: 



Model A: 



Model B: 



Zj t i ~ Poisson($ ;/] jf?*) 



y.i 



7 , z i' i 



w ~ Poisson(6>*) 
{Zj,i}i=i ~ Multinomial (w t , {%i}?=i) 



where in both models all the components zji of z are mutually 
conditionally independent given the appropriate parameters. 
Model A roughly corresponds to the model we consider 
throughout the paper; Model B corresponds to considering 
Poisson realizations with intensity 9* (denoted w) incident 
upon our system and then redirected to different detector 
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elements via <&. We model this redirection process with a 
multinomial distribution. While the model y ~ Poisson(<i>6>*) 
is slightly different from Model A, the following analysis will 
provide valuable insight into discrete event counting systems. 
We now show that the distribution of z is the same in 
Models A and B. First note that 



y.i 



£ 



£ 



and Wi 

Under Model A, we have 

n m _*. .fl*/i n*\z- ■ 



(21) 



i=l J=l 



z- ■' 






n n 






n n 

»=1 \3'=1 



e"^ (0*)* 



(22) 



where in the last step we used pTj ) and the assumption that 
E"li $i,i = !■ Under Model B, we have 



$ 



P(2|w) 



n w - n t^t' if £ **< = w * yi 



p(w\e*) = Yl 



»=1 3=1 

0, 



otherwise 






WI, 



j>(*|0*) = ^ P(«|«)P(«|^) 



nn 

i=lj=l 



?(' 



$ • '* e" 



*?,*• 






nn 



°jl«- 



<\WJ; 



(23) 



to Sipser and Spielman [21], we have the following chain of 
inequalities: 

\\Fy\\x > \\hh 

>\\Fvs\\i-JZ £ IWI 

>d{l-2e)\\y s \\i-Y J Y, , 

»=1 (j,l)eE:j£Si,l£jV(S) 

I 



vsi-ih 



>d{l-2e)\\y s \\ x -2kdeY J 



! = 1 



lySi-illl 



>d(l- 2e)||y s ||i -2de||»||i. 

Most of the steps are straightforward consequences of the 
definitions, the triangle inequality, or the RIP-1 property. The 
fourth inequality follows from the following fact. Since we are 
dealing with a (2k, e)-expander and since |5U5j| < 2k for ev- 
ery i = 0, ..., t, we must have \Af(SUSi)\ > d(l-e)\SUSi\. 
Therefore, at most 2kde edges can cross from each Si to 
Af(S). From the above estimate, we obtain 



| J Fu-i^||i + 2de||y||i> (1 - 2e)d||y s || 1 . 



(24) 



The fourth line uses ( f2T| . Since (|22]» and ( |2"3"j ) are the same, we 
have shown that Models A and B are statistically equivalent. 
While Model B may be more intuitively appealing based on 
our physical understanding of how these systems operate, 
using Model A for our analysis and algorithm development 
is just as accurate and mathematically more direct. 



Using the assumption that ||u||i > \\v\\i — A, the triangle 
inequality, and the fact that ||usc||i = Ufc(u), we obtain 

||«||i> ||«||i-A 

u-y\\i -A 

(«-v)5||i + ll(«-y)s-||i-A 
- Huslli - \\ysh + \\us°h - \\ys°h - A 

u||i-2||u s =||i + ||y||i-2||ys||i-A 
u||i - 2a k {u) + \\y\\i - 2\\y s \\i - A, 

which yields 

Hi/Hi <2a k (u)+2\\y s \\i+A. 

Using ( p4] > to bound ||ys||i, we further obtain 

„ , N 2||Fu-Fv||i-|-4de||y||i 

||y||i < 2o*(«) + J (1 _" 2e)d +A. 

Rearranging this inequality completes the proof. 



Appendix B 
Proof of Proposition III.5I 

Let y = u— v, let S C {1, . . . , n} denote the positions of the 
k largest (in magnitude) coordinates of y, and enumerate the 
complementary set S c as ii,l2, . . . , i n -k m decreasing order 
of magnitude of |j/j. \,j = 1, . . . ,n — k. Let us partition the 
set S c into adjacent blocks Si, . . . , St, such that all blocks 
(but possibly St) have size k. Also let So = S. Let F be a 
submatrix of F containing rows from N(S). Then, following 
the argument of Berinde et al. [18], which also goes back 



Appendix C 
Technical lemmas 

Lemma C.l. Any 9 e 8l satisfies the bound 

m 



Proof: From Lemma |II.4| it follows that 

||*0||i<||0||i<£. voee L . (25) 
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Let (3* = $6* and /3 = W. Then 



into three terms: 



i/?*-/?ii 2 = Ei^-^- 
Wi 

/ m 



E$g» 



1/2 _ fl l/2 



v i=l 



pt+pi' 2 



< 



EK /2 -^ 1/2 | -K 1/2 +^ 



*j=i 



2E 



log 



log 



P»o*(y) 
P*sfo) 



2pen(#) 



>/p*«(v)e- 


-pon(e) 




" 


l <v/P*e*(l/)E<w 


/ p*»(v) 

V P*e*(y) 



< 2 E|rr-/r 2 | .ei/^+^I 

i=i j=i 

= 2Y J \p*] /2 -P] ,2 \ -(11/3111 + 11/3111) 

m „ 



We show that the third term is always nonpositive, which 
completes the proof. Using Jensen's inequality, 



"1/2 _ nl/2 



1=1 



4i£k*r 



a/2 



(*/).• 



1/2 



i=l 



IE 



log 



V^y)^ pcn{§) 











VP*e*(y)E$e* 



The first and the second inequalities are by Cauchy-Schwarz, 
while the third inequality is a consequence of Eq. d25}, ■ 

Lemma C.2. Let 6* be a minimizer in Eq. |6]). Then 



<log 



E 



l*e* 



^|($n 1/2 -($^) 



1/2 



U=l 

min [KL(P$ e . 



yP~^Je-P°"( e ) 



v/P* fl . (j/)E* e . 



i"W(y) 



2pen(0)]. (26) Now 



Proof: Using Lemma C.4 below with g = $0* and h = 
$6* we have 



E 



<I>6>* 



EK)< 1/a -(**)« 1/a 

2 log 



E 



E$e* 



/ y/P^yW^MMv) 



V^§(v>- 


-pon(e) 




p 1 


VP*e*(y)E$ e . 


/ P*«(W) 

V p *e* (y) 



< ^ e -Pcn(9) < L 

fleer, 



Clearly 



Since E$#» 



y / P $e ,(y)P^( 2 /)^(y) = E $e , 



<I>0 



s(v) 



P$e*(y) 



log (* 



dil 



P*s(y) 



KL(P$ e . llP^a), we obtain 



We now provide a bound for this expectation. Let be a 
minimizer of KL(P$ e . ||P$ e ) + 2pen(0) over € 6 L . Then, 
by definition of 9, we have 



/ P«j(v)e- p€nW > ^/P #e -(y)e- pen(9) 
for every y. Consequently, 

1 yP^)e- pcn( ^ 



E 



$e* 



P*o*(2/) 



< 



V^P^Je-P^^E.fl. 



We can split the quantity 

/ 

•2E* fl . log 



v / P^) e_pen( ^ ) 



yp~(y)e- pcn(e) E 



$e* 



P^g(y) 

P*e*(y) 



ggjfa) 

p*o* (y) 

\ 
/ 



.4=1 

<KL(P M .||P^) + 2pen(fl) 

= min [KL(P <&e «||P tE , e ) + 2pen(6')l 

8g0l 



which proves the lemma. ■ 

Lemma C.3. If the estimators in 0j, satisfy the condition |9|, 
then following inequality holds: 



KL(P $e , ||P M )< -||. 
c 



we e e L . 
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Proof: By definition of the KL divergence, 

KL(P$ e . 



— E<S>0* 



log 



*e 



(y) 



P*efo) 



= E E (*«*)i 



j=i 



E 
3=1 



j/j 



loe 



(**); 



J=l 



($0*)>g 



(gg!)j 



< 



(**)j 



3=1 

m 1 

= E^K-^- 

<i||*(0*-0)||l 

< I||$(e*_0)||2< I||r 

c c 



- ($0*),- + ($9) 3 



-1 -Wj + Mi 



The first inequality uses log t < t — 1, the second is by ((9), 
the third uses the fact that the t\ norm dominates the £2 norm, 
and the last one is by the RIP-1 property (Lemma |ll.4|i. ■ 



Lemma C.4. Given two Poisson parameter vectors g,h e 
the following equality holds: 



2 log- 



1 



/ y/F g (y)F h (y)d[i(y) £J 
where /1 denotes the counting measure on 
Proof: 



9/ -h 



1/2 _ ,,1/2 



F g (y)¥ h (y)dfi(y) 



y 

m 00 



teM!!„-te+y/2 



IIE „.. 



=ne 

i=i 



E 

w=o 



y, ! 



J=l V „ < 



=1 



-ite /a -fcn 



j=i 

Taking logs, we obtain the lemma. ■ 
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